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Abstract 

A challenge in multivariate problems with discrete structures is the inclusion of prior 
information that may differ in each separate structure. A particular example of this is seismic 
amplitude versus angle (AVA) inversion to elastic parameters, where the discrete structures 
are geologic layers. Recently the use of systems of linear stocastic partial differential equations 
(SPDEs) have become a popular tool for specifying priors in latent Gaussian models. This 
approach allows for flexible incorporation of nonstationarity and anisotropy in the prior 
model. Another advantage is that the prior field is Markovian and therefore the precision 
matrix is very sparse, introducing huge computational and memory benefits. We present a 
novel approach for parametrising correlations that differ in the different discrete structures, 
and additionally a geodesic blending approach for quantifying fuzziness of interfaces between 
the structures. 
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1 Introduction 



In spatial statistics, the need for specifying different behaviour in different regions in space is 
crucial for making a good prior model. The litterature is abundant with methodologies for 
this. In the multivariate setting, this generalises to having different correlations between the 
fields in different regions and different cross-differentiability properties. 

A particular model problem where this is important is the seismic AVA inversion problem, 
which well studied in the geophysical litterature. There are several incarnations of this 
problem with varying degrees of complexity. In this article, our primary example is the 



inversion problem studied in Buland and Omre (2003); Buland et al. (2003); Rabben et al 



(2008), using the wavefield propagation approximations in Aki and Richards (1980), which 



results in linear systems of equations to solve. Variants and extensions of these equations are 



found in |Stovas and Ursin| ( |2003[ ), including nonlinear approximations that may yield better 
inversion results in some situations. We exemplify our contributions using this example 
explicitly throughout the article. 



The model we adopt in this text is the same as in Buland and Omre (2003), which is 
essentially 



d(s) — w * rpp(m)(s) + e, 



(i) 



where * denotes convolution in time, w is an approximation to the source wavelet - i.e. 
the shape of the wave traveling through the subsurface, and rpp(m) denotes a reflectivity 
operator. The reflectivity operator takes relative differences in elastic parameters to reflection 



coefficients for the wave. We adopt the following elastic parameters, 

Av P Av s Ap 

mi — — , m 2 , ra 3 — . (2) 

v P v s p 

I.e. mi,m2,m3 denotes the relative difference of P- velocity (pressure wave velocity), S- 
velocity (shear wave velocity) and density respectively, and the reflectivity operator is defined 
by 

rpp(O) = ^(1 + tan 2 0) - 4m 27 2 sin 2 + ^(1 - 4 7 2 sin 2 0), (3) 

there is the reflection angle and 7 2 denotes a background (vs/vp) -ratio. Rewriting this in 
matrix notation yields 

d = WAm + e, (4) 

where d are observations, W is the discretized wavelet operator, A, the discretized reflectivity 
operator, m the elastic parameters and e ~ A/"(0, a 2 1) an error term which is often assumed 
to be normally distributed. 

In this text, we will explore a novel method for designing a good prior for m using linear 
systems of stochastic partial differential equations. We emphasize, however, that while the 
approach developed here is designed with seismic AVA inversion in mind, it is very flexible 
and can be adopted in any setting where we have multivariate fields with separate regions 
where we would like to incorporate prior information. 

All the figures that appear in this text have comparative scales, so that the colour schemes 
have the same min-max values in each individual figure. Hence, the figures makes sense, 
without cluttering them with additional colourbars. 



2 Prior specification 

The choice of prior in the inversion problem is of great importance when it comes to the 
performance of the inversion. It is vital to choose a "good" prior to emphasise the properties 
of m that we know it has. For us, m will denote the parameters of interest, and it depends 
on position. We construct the prior by combining heuristics and expert knowledge of the 
spatial model. For a Gaussian prior model, the standard way of specifying the prior model is 



through the covariance function, which is often assumed to be stationary (see, e.g. Buland 
and Omre| (|2003[)). A stationary covariance function is defined by a correlation function 



that defines how much a point is correlated with its neighbours and a marginal variance 
parameter, g 2 through 

e 2 c(l|x-y||A) = Cov(x,y), ( 5 ) 

where A is a positive definite matrix that defines the non-changing anisotropy of the field. 
In the Gaussian case, this defines a strictly stationary process if the mean is constant. There 
is a list of widely used covariance functions in Cressie (1993). We will throughout this text 



assume that the prior is from the Gaussian family. This family is defined by having density 

p(x|Q, n x ) = (2tt)-"/ 2 det(Q) 1 / 2 exp Q(x - ^) T Q(x - , (6) 

where Q = X) -1 is the precision matrix - the inverse of the covariance matrix 5] - and \i x is 
the expectation, E(x|/x iC ). 

Moreover, the fields mi, 777,2, are assumed correlated with correlations specified by well 
data and/or other local knowledge. In the discretized domain, this allows for the following 
decomposition of the total covariance matrix 

= ^space ® 5]q (?) 
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where 5] space denotes the spatial covariance matrix, typically defined through a covariance 
function, and 5]q the- correlations between the elastic parameters. Since seismic observations 
typically are on a regular grid, either in 2-D or 3-D, it possible to let 5] space be circulant by 
extending the grid by as many points as is needed to get the correlation below a threshold - 
typically 0.1 or 0.05. This allows us to use fast Fourier transforms for computing quantities 
of interest related to the covariance matrix. This, together with the Kronecker structure 
of S m allows for fast computations. See Buland et al. (2003); Rue and Held (2005); Gray 



(2006) for details. This approach also has very low memory requirements; since 5] space is 



circulant it may be stored using only one vector. Hence storage is 0(n) and computations 
(of any kind) are at most O(nlogn), where n is the number of nodes in the extended lattice. 



2.1 SPDE formulation 

While this decomposition is sensible, it is also very inflexible and requires stationarity for low 
storage requirements. Another way of pursuing good prior models with fast computations 
and low memory requirements is through the use of elliptic (pseudo) differential operators 
(Ruzhansky and Turunen (2009), part 2 is an accessible source). The theory of pseudo 
differential operators is closely related to Weyl transforms and short-time Fourier transforms 



or Gabor transforms (Feichtinger et al. (2008)) and usual spectral considerations is seismology 
apply. In this approach, it is the sparsity of the resulting precision matrices that makes 



storage and computation manageable. Recently, |Lindgren et al.| ( |2011[ ), studied how to apply 
such operators in a statistical setting. They studied a Riesz-Bessel operator, (— A + ft) a / 2 



and its relation to computation and Matern covariance models (Matern I960) [Whittle 1963). 
The main lessons are firstly, if 



M Kia x(s) := (n 2 - A) a ^ 2 x(s) = W(s), 



(8) 



where W is spatial Gaussian white noise, then x(s) has Matern type covariance function, i.e., 



p(r) 



T(a - d/2)2 a ~ d / 2 - 
2 _ r(a - d/2) 

r(a)(47r) d / 2 K 2 («- d / 2 )' 



-( K r) a - d/2 K a 



-d/2 



(nr), 



(9) 
(10) 



where K s is the modified Bessel function of the first kind. Secondly, fast computations 
through finite element methods or other discretisations of the differential operator in ^ are 
available through the induced Markov properties of the discretisation matrix, Q 



space- 



That 

essentially means that Q m = Q spa ce ® Qo is ( ver y) sparse and with a structure ameanable to 
Cholesky factorisation. An alternative requirement is that we can construct the matrix vector 
product Q m v and det(Q m ) relatively quickly through some iterative or direct procedure, 
see |Simpson| fl2008| ); [Aune et al.| fl2012a|b| ) 

When addressing the "stationarity" of the field defined by (J8|, it is only stationary in 
the sense of (|5| if it is defined on the whole of R fe , where k — 2, 3 in our case - alternatively 
when the corresponding operator is defined on a manifold without boundary. In our case the 
domain on which ([5| is defined is merely a subset, namely a square or box in R 2 or R 3 . Hence 
boundary effects resulting from boundary conditions may destroy its direct interpretability in 
terms of this equation. It is, of course, possible to specify boundary conditions in such a way 
that you retain the property in (|5|, but usually there are more natural physical boundary 
conditions that in our opinion improves upon the specification through SPDEs compared to 
the model defined by covariance matrices through stationary covariance functions also in the 
stationary case. 

There are two properties that are desirable to have in the prior model in AVA inversion. 
The first is being able to have different correlation length at different points in space. If a 
geologist have sound reasons to believe that a layer is very inhomogeneous, it may warrant 
putting a lower correlation length here than in a layer that is thought to be very homogeneous 
with very similar properties. Facilitating this is trivial - one merely lets k 2 = tv 2 (s) vary with 
space. The other property that is very desirable to have is anisotropy. Letting the correlation 
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Figure 1: Realisations from stationary model given by (|8| (left) and non- stationary model given by (11) 
(right). The non- stationary model has a curved interface, and the field below the interface has anisotropy 
directed along the curve, while above the interface it is almost isotropic. 




length vary with direction is very natural given that the layers are typically not flat but are 
deformed in a specific way. The SPDE resulting is the following variant of ([8|: 

M K , a , s x(s) = (« 2 (s) - V • K(s)V) a/2 x(s) = W(s), (11) 

where A is a 3 x 3 symmetric positive definite matrix defining the anisotropy angle and 
principal correlation length in the three directions defined by the eigenvectors of the matrix. 
Realisations of the stationary model and the nonstationary model is given in Figure [I] Here 
we have illustrated the "layer" flexibility mentioned above, where the top layer is isotropic, 
and the bottom layer is anisotropic with deformation defined by the layer. 

To see how this relates to the usual approach, consider Qo = X)q 1 and say that mi, m2, ms 
have equal Matern covariance models (this includes the widely used exponential and Gaussian 
models), then the prior given as in ^ is given by the following system of stochastic differential 
equations: 

(M w ^Q )m = W (12) 

where W is vector Gaussian spatial white noise. The experience in AVA-inversion is that 
at least one component of m worse resolved than the others, with m x being resolved the 



best (see Rabben et al. (2008) or any other article treating this problem). The obvious next 



question then is whether or not ( 12 ) specifies the best way of lending strength to the least 



resolved parameters. If not, can we find better operators on the diagonal in (12), and/or 
replace the off-diagonals with other operators that have better properties in the inversion 
problem? The answer to this question is not obvious, but we investigate some alternatives 
and see how they perform in our inversion problem; the criterion for a better prior in the 
synthetic case being that E((m tr ue — m e?t w ) 2 ) < E((m trU e - m e!?t se ) 2 )> where m^ t se is given 
by the prior model 



It is possible to replace the operator M^ a in (12) by more general pseudo-differential 
operators. Representations of such operators in terms of its symbol are given by 

(K a f)(x)= [ a(x,0me 2 ™M, (13) 

where / is the Fourier transform of /, and a is the symbol of the operator. The symbol can be 
interpreted as defining the local spectrum of the operator. A deep theorem given in |Rozanov| 



(1977) states that a stationary random field is Markov (in the continuous sense) if and only if 
a~ L is a symmetric positive polynomial. Hence Markov fields are represented by differential 
operators. Now, if the field in question is not Markov, it is possible to approximate a by 
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a rational approximation, <j(x,^) _1 « a 7ati( x i€) ~ Xlj=o a i( x )(^ 7r ^0 J - To find the a^s one 
can, for instance, use optimisation techniques. This is one way to do it, but we suspect that 
the time-frequency localisation of such an approach may be suboptimal, and discretization 
of the non-Markov operator may be better suited for time-frequency compressing approaches 
inducing approximate Markovity. We do not pursue these type of ideas here, but mention 
them as they are good candidates for future research. 



3 Systems of SPDEs - generalising " Q " 

It is easy to write the form the generalised approach must have. First, for z, j = 1, . . . , 3, let 

Kn = qij (s)( Kij (s) - V • A ij (s)V)"^ 2 (14) 
and define the following system of SPDEs 



K u K 12 K 13 
Km(s) = ( K 12 K 22 K 23 | m(s) = W(s) 

Kis K23 K 33 



(15) 



For qij(s) = Q^- and Kij = M K a we recover the structure in the previous section with 
stationarity. For convenience, we call % (s) the blending coefficients. In Hu et al. (20121), they 



study the properties of this model in the stationary case, and give the link to the multivariate 

defines a valid Gaussian Markov 
In our treatment, we 



Matern fields in Gneiting et al. (2010). Any choice of K{ 3 
random field, both in the continuous sense and when discretized. 



restrict ourselves to the case where a^- 



a, A^j(s) = A(s) and a%-(s) = ft(s). 



3.1 Parametrising the blending coefficients 

In general, it is both hard to interpret a local precision matrix, Qo(s) = {qij(&)}ij defining 
how the individual parts of the multivariate fields is related to each other at position s, and to 
ensure that this matrix is positive definite. It is much more natural to work with the inverse, 
namely the correlation matrix defining the local correlation of the fields, Eq(s) = Qo 1 ( s )- 
The qij(s) is then simply given by the corresponding matrix elements. In the AVA inversion 
problem, information about correlation in different layers may come from geologists or 
geophysicists for who may know of phase changes when going from one layer to another in 
the different layers, or other, more complex phenomena. It may also come from well-logs 
that may contain information about such matters. 

Suppose that E (s) = E ,i for s e Si C R d and E ,2 for s e S 2 C R d . Then we have 
a model that has specific correlations in one spatial region of the multivariate fields, and 
different correlations in another spatial region. There is obviously a transition between these 
two states. If the transition is discontinuous, this may be seen as a discontinuity of the 
correlations in the realisation of the multivariate random field, which may make sense in 
some situations. 

In order to visualise what this means, we give realisations of the four major prior models 
we have discussed. In Figure |2j no prior information about the geometry of the subsurface 
can be included. In Figure [3] geometric information has been incorporated, but no change in 
the correlation between the parameters in space can be included. In Figure |4j an example 
realisation from the full model is given. Pay attention to the rightmost field - here the 
correlation to the other two fields changes from being positive in the top layer to being 
negative in the bottom layer. 



3.2 Geodesic blending 

There are obviously many ways of making a smooth transition between Eq 5 i and Eq,2 7 but 
one key consideration is that So( s ) must remain positive definite for all s in some transition 
domain St- One thing is certain - it is not necessarily enough to let the off-diagonals element 
in Eo,i change linearly in R 3 to the corresponding off-diagonal elements in Eq,2- 
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Figure 2: Stationary model given by (Fn). The field looks the same wherever we are. 




Figure 3: Nonstationary model with fixed Qo- Here the bottom layer has anisotropy along the curve of 
the interface, and the correlation between the fields is fixed through space. 




Figure 4: Full nonstationary model with varying % (s) according to (20). The bottom layer has anisotropy 
along the curved interface, and the correlation between the fields changes between interfaces. In particular, 
the rightmost field is positively correlated to the others above the interface and negatively correlated below 
the interface. 
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A very natural way of making such a transition between J^o,i an d So, 2 is by considering 
geodesies on the manifold of symmetric positive definite matrices, denoted P^. The natural 
metric on this space has a reasonable statistical interpretation, closely related to information 
entropy and Kullback-Leibler divergence, and an accessible account for the theory is given in 
Bhatia (2007). Different treatments are given in ara et al. 1996 Hiai and Petz, 2009). 



For completeness, we give a small account of the definition and properties we need related to 



this manifold. This exposition is based on Hiai and Petz| ( 2009 ) ; Bhatia (2007). 



The Boltzmann entropy of the Gaussian distribution 
is given by 

1 



tz]( [2009 
^§,defir 



defining an information potential, 



B(p(x|Q,M*)) = B(Q) = -logdetS 



■C, 



(16) 



where C is an arbitrary constant and 5] = Q 1 is any positive definite matrix. The 
Riemannian metric based on this information potential is the Hessian 



d 2 



B(Q + sU + tM) = tr QHQK, 



s=0,t=0 



and where H, S G §d, the tangent space of symmetric matrices, E>d = {V G R dX(i |V 
This defines the line element 



ds 



(tr [(Q-^dQQ" 1 



/2n2 



1/2 



Hence, if we have a curve in P^, i.e. 7 : [a, b] — ^ P^, its length can be calculated as 

Cb , r nx 1/2 



(7)= / (tr[(7W 1/2 yW7W 1/2 ) 2 l) 

J a 



dt 



(17) 

v T }. 

(18) 
(19) 



A nice property that follows from this is that lengths of curves are invariant under congruence 
transformations. That is, if g(t) = X T 7(£)X, L(j) = L(g). The geodesic, the curve with 
minimal length, between two matrices, A and A can from this be deduced to be 



g A , B (t) = A# t B = A 1 / 2 (A-V'BA- 1 /*)* A 1 / 2 , t e [0, 1]. 



(20) 



Obviously, g AB (0) = A and g AB (l) = B. It is this curve we use when we go from 
A = Qo,i = to B = Q ,2 = ^0,2 m different discrete structures in our prior model, 
and this ensures that we are within the realm of positive definite matrices in a natural way. 
Noting that (A^^B) -1 = A _1 #tB _1 , we see that it is unproblematic to work with precision 
matrices rather than covariance matrices. Integrating g A B (t) yields the distance between 
the two matrices, 



d ¥d (A,B) = j g A>B (i) = (tr [(logA-^BA" 1 / 2 ) 2 " 



1/2 



(21) 



A potential drawback of using this strategy is that if Qo,i, Qo,2 are correlation matrices, 
and what you want is a continuum of correlation matrices, gg o q o 2 (t) are not correlation 
matrices for t G (0, 1). It is possible to correct for this by using geodesies on the submanifold 
of correlation matrices in P^. In practice, however, gQ 01 ,Q 02 (£) are ver Y close to being 
correlation matrices in most cases. We do not have any counterexamples. 



3.2.1 Fuzzy interfaces 

In some situations, we may actually have a hard interface in our multivariate field, but even 
in this situation, experts may place the interface incorrectly, which may lead to imprecise 
interpretation of the field. The geodesic blending strategy discussed in the previous section 
gives us a way to handle this situation in a specific way: the blending range may serve as 
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Figure 5: Illustration of blend range. Guessed interface (left), true interface (right), blend range 
illustrated in grey (bottom) 





quantifying the uncertainty or fuzziness of this interface. This range may then be estimated 
based on realisations of the field, possibly requiring a strong prior for identifiability. 

To illustrate this, suppose that an expert says that the interface is as in the upper left 
part of Figure [5j while the real interface is given on the right. The bottom illustration in 
Figure |5] shows what the geodesic range should be in this case (grey area) - it should cover 
the true interface properly, showing that there actually is a fair amount of uncertainty in the 
placement of the interface. In Section [2J we investigate whether this range may be estimated 
purely from data or if a strong prior on the range is needed. It is, of course, possible to 
combine this idea with procedures for actually estimating the interface, but, as always, this 
increases the complexity of the model that is to be estimated. Additionally, the blend range 
may easily confound with potential parameters needed to estimate the actual location of the 
interface. 

3.3 Modified parametrisations 

There are many ways to modify the parametrisation described above to reduce parametrisation 
demand or incorporate different flexibility. A possible way to reduce the parametrisation 
demand further is to do the modeling in the Cholesky domain. This is a simplification, but 
it is one we believe should increase interpret ability and possibly estimation properties. To 
motivate this approach, consider the following: Suppose that the Cholesky factorisation of Qo 
is given by Qo = LoLq , and that Q spa ce = Bf(Q|)*, for some matrices Bf,B2- Generating 
the matrices {Bf }i 5 2 can for instance be done by using a s = a/2 in ([8| and discretizing this 
operator, but there exist many other factorisations that may behave in better way for the 
problem at hand. By a Kronecker product identity, Q spa ce ® Qo = (Bf <S> Lo)((B|)* (8) L^)- 
The intuition stemming from this identity carries over to the more general case in a natural 
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way: Let hj(s) be entry z, j of the Cholesky factor of the matrix {qij(s)}ij locally, and define 



locally operators that will correspond to some square root of its original form in (14). It is 



possible to define the operators in such a way that we get back (15), but this is of minor 



concern in practice as long as we get the interpretability we want. This is remniscient to the 



triangular approach mentioned in Hu et al. (2012). 



4 Parameter estimation and conditional expectation 

In order to show that our proposed model is useful with confidence in the realm of seismic 
AVA inversion, we must show that estimation of hyper-parameters in the prior model is 
feasible and that the conditional expectation, E(m|d, 0), is better than in the simpler model. 
A natural way to see if the hyper-parameters are identifiable is to simulate from the prior 
fields and do maximum likelihood estimates on these. If this works well, one may go one 
level higher and assume noisy observations of the form 



y WAx + e, 



(22) 



where W denotes a convolution matrix defined by the wavelet, and A denotes the reflectivity 
matrix, and e ~ A/"(0,I). It is also here possible to do maximum likelihood estimates. For 
more information on estimating this type of model, consult your favourite treatise that 
discusses latent (Gaussian) models for, e.g. Rasmussen and Wiliams (2QQ6[); Rue and Held 



(2005); Cressie and Wikle (2011). In treating this estimation problem, we use the simpler 



anisotropic model where the correlation changes from positive to negative at an interface 
defined by a straight line. It is, of course, possible to estimate the geometry as well, but this 
is beyond the scope of this text. 

When estimating the g^-, supposing it changes between layers, we must impose constraints 
to enforce the interpretability we want - namely that of its local inverse being the correlation 
matrix of the multivariate field at that point. Now, the matrix consisting of ones on its 
diagonal with three free parameters off its diagonal uniquely specifies these constraints 
through its eigenvalues: they must all be greater than zero. Hence we have three constraints, 
depending only on the off-diagonal elements of the local correlation matrix. The same type 
of restriction would apply if we were to use general local 3x3 covariances instead. In that 
scenario, however, the three constraints would depend on six parameters instead of three. In 
this section, we will denote the different models as follows 

1. Model 1 is the simple stationary Qo (8) Qspace as i n @ 

2. Model 2 is stationary in space using the extended % (s) parametrisation as in Section 



3.1 , equation (15), using interpretability constraints 



3. Model 3 is nonstationary in space and using the extended % (s) parametrisation as in 
Section 3.1, equation (15). Additionally, we use a blending of two correlation matrices 
at the interface, so that the correlation change is not discontinuous. 



4.1 Identifiability 

We show that the parameters in So, Si are identifiable through simulation. To do this, we 
simulate from many multivariate fields and estimate the parameters by maximum likelihood. 
If the estimated maximum likelihood density - which is estimated from several realisations - 
is unimodal, the parameters are identifiable. Suppose that XIq, Si are given by 



1 




1 



Si 




(23) 



using ft 2 = 0.1, 



and t^Q, with r 2 = 50. Using 200 realisations from the field, we get 
maximum likelihood density estimates for the parameters - these are illustrated in Figure 
[6j For obtaining the parameters, we used a quasi- Newton method with initial correlation 
parameters being zero. Judging from this figure, since all density estimates are unimodal, all 
parameters seem to be identifiable. 
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Figure 6: Maximum likelihood density estimates for correlation parameters, n 2 and r 2 using direct 
observations of 200 fields 




Figure 7 : Maximum likelihood density estimates for correlation parameters, n 2 and r 2 using indirect 
observations of 600 fields 
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In the case where we have noisy observations, we use profile likelihood to estimate the 
noise level, a 2 , and a quasi-Newton method to estimate A 2 
parameters. In this case we used the correlation matrices 



T 2 <7 2 , K 2 and the correlation 



1 



So 




1 



Si 




(24) 



In Figure [7] the corresponding estimates for a hidden field is given. Of course, it is much 
more difficult in this situation, which is reflected through the broad distributional tails in 
the figure. Overall, however, the estimates seem to recover the true values quite well. One 
odd observation is the bimodality of E (2,3). We believe it may come from observing rather 
small fields, from a 64 x 64-grid, and that it may go away for larger ones. The values over one 
on the left part of the figure are artefacts coming from using a kernel smoother for estimating 
the density. 



4.2 Conditional expectation 

The real test on whether it is wise or not to use this advanced parametrisation of the 
model is essentially the reconstruction problem: based on noisy observations, are we able to 
reconstruct the original fields with higher fidelity? 

In the following subsections, we give several reconstructions examples, and we use two 
observations schemes. The first one is based on identity observations with i.i.d. noise, and 
the second is based on the AVA model, giving the observation matrix WA followed by i.i.d. 
noise. 
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Figure 8: Kriging for identity observations with signal-to-noise ratio 1/50. True parameters (left), 
kriging using true Model 2 (center), using Model 1 (right) 




4.2.1 Identity observations 

First, we present a reconstruction example where the observation matrix is the identity, 
followed by iid. noise, and with two signal-to-noise ratios. One with A 2 = 50 and one with 
A 2 = 0.5. For these two models, we use the following true correlation matrices 

1 0.99 0.99 \ / 1 

£ = I 1 0.99 , Ei = 1 0.99 I . (25) 





In Figure [HJ we illustrate reconstruction of the first of the three fields with signal-to-noise 
ratio 1/50, using a flat interface and identity observations. I.e. the field true field is generated 
by Model 2, followed by i.i.d. noise. A priori we believe one of the worst situations for 
estimating Model 1, as correlations change very much from structure to structure and the 
noise level is very high. The likelihood function in the situation with high noise levels 
appears very flat, requiring high accuracy and many iterations in the optimisation scheme 
to give consistent estimates. For A 2 = 50, ||E M2 (x|y, 0) - x|| 2 /||x||2 = 0.526 for Model 
2 and ||E M1 (x|y,0) - x|| 2 /||x|| 2 = 0.674 for Model 1. The first field is chosen, as for the 
correlation matrices defined for this, the first field is the one with changing correlation 
between interfaces, relative to the others. The main effect we see in this comparison is that 
the level of the reconstructed field using the Model 1 does not completely reach up to the 
true levels - we believe this can be attributed to a flattening effect arising from the sum of 
the two correlations in the different layers being zero. 

In the second comparison we generate fields and observations from Model 2, with A 2 =0.5. 
Here a different effect is more prominent - we see that the reconstructed field on the right in 
Figure [9| i.e. the reconstruction based on using Model 1, is smoother and does not exhibit 
as much of the jaggedy effect of the true surface compared to the field in the middle. A 
consistent property when estimating the Model 1 is that n 2 seems to be underestimates, 
leading to a larger range and hence smoother reconstruction. One may think that this 
smoothing effect of the field on the right in Figure [9j but for comparison, we also include 
reconstructions of the second field, depicted in Figure [l0| Here the mentioned smoothing 



effect is not as present as in Figure [9 
the changing correlations. In Figure 



Hence, we believe that this is an effect induced by 
9] ||E M2 (x|y,0) -x|| 2 /||x|| 2 = 0.179 for the middle 
reconstruction, and ||E M1 (x|y,0) — x|| 2 /||x|| 2 = 0.269 for the rightmost one, while in Figure 
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||E M2 (x|y, 6) - x|| 2 /||x|| 2 = 0.168 and ||E M1 (x|y, 6) - x|| 2 /||x|| 2 = 0.195. 



4.2.2 AVA observations 

While the results using the identity observations are convincing in the extended models' 
favour, we also need to investigate the effects where the observation matrix is the seismic 
AVA model. In this case, the true fields are generated by Model 2, and the observations are 
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Figure 9: Kriging for identity observations with signal-to-noise ratio 1/0.5, field 1. True parameters 
(left), kriging using Model 2 (center), using Model 1 (right) 




Figure 10: Kriging for identity observations with signal-to-noise ration 1/0.5, field 2. True parameters 
(left), using true model (center), using model defined by (12) (right) 
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Figure 11: Observations using identity observations (middle) and the seismic AVA model (bottom) 
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linear combinations of the various fields at each space location followed by a convolution 
with a smooth wavelet and i.i.d noise. We use A 2 = 0.5 and A 2 = 20 in these examples. 



In Figure [TT] we can see the observations that are generated by this process. A key 
feature in the observations is that there occurs some cancellation, resulting from the fact that 
they are linear combinations of the underlying fields. This results in varying signal-to-noise 
ratios depending on the varying correlations. 

Reconstructing the original multivariate field using the AVA observation scheme is more 
difficult than for using the identity observation matrix. The aforementioned cancellation 
effect is a major contributor to this. Additionally, there does not seem to be a straightforward 
way of interpreting the estimated correlation parameters coming from Model 1. In Figure 



12 , we illustrate the true parameters on the left, with reconstruction using the Model 2 the 
middle and the Model 1 on the right, using a signal-to- noise ratio of 1/0.5. The effects we see 
are reminiscent of those using identity observations, but the smoothing effect is not present 
here. In this case ||E M2 (x|y, 0) - x|| 2 /||x|| 2 = 0.461, while ||E M1 (x|y, 0) - x|| 2 /||x|| = 0.728. 
Reconstruction using the same model, with a signal-to- noise ratio 1/20 is depicted in Figure 



13 No smoothing effect relative to Model 2 is observed here, but predictions are worse using 
Model 1, having ||E M2 (x|y, 0) - x|| 2 /||x|| 2 = 0.762, while ||E M1 (x|y, 0) - x|| 2 /||x|| = 0.863. 

4.2.3 Identity observations and non-stationarity 

Until this point, we have only studied the effects coming from changing correlations between 
interfaces. The model we have described is much richer than that, providing a flexible way 
of specifying anisotropy that moves along geometry of the subsurface. In this situation, we 
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Figure 12: Reconstructed field using the AVA model with signal-to-noise ratio 1/0.5. True parameters 
(left), kriging using Model 2 (centre), using Model 1 (right). 




Figure 13: Reconstructed field using the AVA model with signal-to-noise ratio 1/20. True parameters 
(left), kriging using Model 2 (centre), using Model 1 (right). 
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Figure 14: Kriging for identity observations with signal-to-noise ration 1/0. 2, field 1. True parameters 
(left), kriging using Model 3 (center), Model 1 (right) 




expect the results to be even more convincing, and we provide one example to cover this 
situation as well. In this case, we generate the true field by using Model 3, and we have 
identity observations followed by i.i.d. noise. The realisations of the true fields are then 
similar to the one in Figure [4j and we estimate both the simple and complex model for 
thereafter giving a reconstruction of the latent field. In Figure [TIJ we see the reconstructions 
using Model 3 (center) and Model 1 (right), and the most prominent effect we see is the 
smoothness differences in the bottom layer. Reconstruction using Model 1 is rugged and does 
not capture the anisotropy of the layer at all, contrasting the reconstruction using Model 3. 
On the top layer, on the other hand, the reconstructions are more comparable. The relative 
errors are ||E M3 (x|y, 6) - x|| 2 /||x|| 2 = 0.280 (center) and ||E M1 (x|y, 6) - x|| 2 /||x|| 2 = 0.382 
(right) for the reconstructions - i.e. predictions are about 37% better using the true model. 



4.2.4 Reconstruction — final remarks 

It is also important to note that if we simulate from the simple model, the parameters 
here are recovered well by using the parametrisation in Section |3Tj This means that the 
correlations estimated by the simple model are close to the ones estimated by the more 
complex model. This, of course, adds to the usefulness of the model in situations where we 
do not know in advance that the correlation changes between interfaces. The uncertainty, 
however, is greater, leading to more disparate estimates of the correlations than when using 
the simple parametrisation. 



4.3 Estimating the blend range for fuzzy interfaces 

In this section, we will treat all parameter except the blend range as fixed. The model we 
will treat is one where the true interface is given as a sine function, and what we guess is a 
flat interface. This is exactly the model which is depicted in Figure [5] In our example, we 
use Model 2 for constructing the true field, followed by identity observations and i.i.d. noise. 

Before actually doing maximum likelihood estimation, we visualise heuristically why it 
may make sense. In Figure [l5j we see a sample of the true model at the top, the true sample 
minus the guessed model in the model, and the true sample minus the blend model with 
optimal range at the bottom. The norm of the bottom figure is less than that of the middle 
one. 



Maximum likelihood estimates for the range is given in Figure 16, where the left figure 
is the range estimates when the guessed interface is a line and the true line is a full-period 
sine with a maximum amplitude of 23 and the right is a half-period sine with maximum 
amplitude 23. These estimates are good in the sense that the range covers the true model as 
in Figure [5] 

A comparison of predictions using the guessed interface with no blend and the one with 
optimal blend is given in Figure [TT| Here we see that the predictions using optimal blend 
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Figure 15: Sample from true model (top), sample from true minus guessed model (middle), sample from 
true model minus blend model with optimal range (bottom). 




Figure 16: Estimated blend range using maximum likelihood for 200 samples. Sine-interface with full 
period (left), sine-interface with half period (right). 
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Figure 17: Performance gain using the blended interface over the guessed interface with sine-interface 
with full period (left) and sine-interface with half period (right) 




range are only marginally better than using the interface with no blend for both interface 
structures. For the blend range, however, better prediction is not the goal. The goal here is 
to get an idea about how uncertain we are about the interface location, and better predictions 
comes as an additional boon, even if the improvement is marginal. 

5 Conclusions and future work 

In this text we have showed three things: First, how it is possible to incorporate information 
about the geometry of the problem flexibly. Secondly, how to facilitate changing covariances 
between elastic parameters depending on position. Lastly, we have introduced a novel way of 
specifying uncertainty related to the position of an interface using the concept of geodesic 
blending based on local correlation of the multivariate field. The first hinges on using SPDEs 
in order to specify local properties of the fields, and the second on how systems of SPDEs 
interrelate depending on position. The geodesic blending approach is based on the smooth 
manifold structure of the set of positive definite matrices. The ideas presented here are not 
limited to the relatively simple models described here - rather, they may be used in any 
spatial inversion problem with a natural geometry where soft constraints based on expert 
opinion may be used. 

Address for corresponding author: 
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Nedre M0llenberggate 70B 
7043 Trondheim 

E-mail: erlend.aune.1983@gmail.com 
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Appendix: Finite difference disretization - the gory de- 
tails 

This appendix is devoted to the finite difference scheme we used for discretizing the elliptic 



operator in (11). We employ a changed notation in this appendix for convenience, replacing 
H with A, and we hope that it is transparent for readers. For a 2-dimensional field with 
a = 1, we have 



au(x,y) a 12 (x,y) \ / u x (x,y) 
a 2 i{x,y) a 2 2(x,y) J \ u y (x,y) 



+ K{x,y)u{x,y) 



a n (^)^ )+K{z,yMz,y) 
=d x (a n u x + a 12 u y ) + d y (a 21 u x + a 22 u y ) + ku 

=a* 1 u x + anu xx + a\ 2 u y + ai 2 % x + + a 2i^ + ^22^2/ + a 22^ 

=diag(A)V • \7u + (ai2 + a 2 i)u xy + a^ x u x + ri\ 2 u y + a| 2 % + a 2i u x (26) 
where a^-,?j = x,y denotes different at ion wrt. x or 1/ of the z, j element of A, depending 



implicitly on the position. To discretize (26), we employ a finite difference scheme. We define 
the following finite difference operators 

^ = \( u l - u l-i), 

where i,j are positions on the grid, with i denoting the x-direction and j denoting the 
^-direction. Now, we define the following operators 



A, 



where 



= S x (a n 5 x u) = S x (yi ~ ^i-i)) 

= h Ki - «0 - ^11 («? - <0) . ( 27 ) 

^ / * ? 7 1 i — 1,7 A 

»ii = 2 + °n ) 

^~ { *?7 1 z , 7 — 1 \ 
«22 = g V a 22 + a 22 J ' 

A equivalent expression holds for A^u. We define o^ 1 = a\{,k = 1,2. For the mixed 
operators we have 



1 



A ty u = 2 ( a i2^^) + h (a^Syu)) (28) 



and we have 



[ ai2^- +1 - u i 



Hence 



S x (o>i25 y u) = ^S x (q 

= ^ - <i) - - <)) 

Sx (ciudyu) = (ai2(Ui - "i -1 )) 

= ( ffl 12( U i - - «12 lj ( u i-l " • 

A > =2^2 Will ~ u Ui) ~ ~ <)) (29) 
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For A+ we reverse the order of the difference operators: 



1 



2 

2h? 



+ (a l iZ( u i ~ u i-i)- a i2 _1 K _1 
And the complete discretisation is 

(A^ + A+ + A+ + K yy )u = f(u, W) 



!>) 



(30) 



In Samarskii et al. ( 2QQ2[ ), it is proved that this scheme is convergent. If we assume that A 
does not vary in space, we can simplify the scheme; 



A* 



AyyU ■ 



1 
1 
1 

ft* 
1 

2h? 



(anK+i - u-) - a n W - <4-i)) 
a 22 K +1 -2^+^- 1 ) 



a i2 (^+ 1 1 



) - a 12 (u 



i+i 



0) 



+ (ai2K *) -ai 2 K_i 



2ft 2 



i+l 



A.y~ X U '■ 



A+ 1 
'i+l 



u 3 ^ 1 ) -a 12 (u J i+1 



0) 



+ (ai2K -«12K 



j-1 



j-1 

^-1 



This corresponds to the following stencil 



# 



-0 

■0 



5 



012 -« 22 - «12 

-an - ai2 2(on + 022 + 012) -011-012 
-a 22 - ai2 0i 2 



(31) 
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